Scalable Distributed Query Processing in Parallel Main-Memory Database Systems

Author

  • Wolf Rödiger
Abstract

The continuous increase in compute speed and main-memory capacity of modern servers has triggered the development of a new generation of in-memory database systems. These systems completely rewrote the traditional database architecture to use main memory as primary storage. Discarding several abstractions of disk-based database systems that are now obsolete enabled unprecedented query performance on a single server. However, network communication slows down queries as soon as multiple servers are involved. The result is a significant performance gap between local and distributed query processing. Still, scaling out to a cluster becomes inevitable once the workload exceeds the capacity of a single server. This thesis seeks to advance the state of the art of distributed query processing in parallel main-memory database systems by addressing the performance barrier introduced by network communication. To this end, instead of concentrating on an isolated problem, we design a novel distributed query engine that adapts to the available network bandwidth as well as to unexpected workload characteristics that hinder scalability. It exploits locality to speed up query processing over commodity networks and implements a novel parallelism model to fully leverage modern high-speed interconnects. We demonstrate the feasibility of our design with a prototype implementation for the high-performance in-memory database system HyPer. Using redo log multicasting and global transaction-consistent snapshots, the engine further enables query processing on fresh transactional data. An extensive evaluation with the well-known TPC-H analytical benchmark demonstrates that HyPer with our novel distributed query engine not only outperforms competing parallel database systems but also scales its query performance with the cluster size.
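To make the cost of network communication concrete, here is a minimal, generic sketch. It is not HyPer's engine and not the thesis's method. It models a hash-partitioned shuffle, the kind of data redistribution an exchange operator performs before a distributed join, and counts how many rows must leave their source server. The function `shuffle`, the `(key, payload)` row layout, and the modulo partitioning function are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical row layout: (join key, payload).
Row = Tuple[int, str]


def shuffle(partitions: Dict[int, List[Row]],
            num_servers: int) -> Tuple[Dict[int, List[Row]], int]:
    """Redistribute rows so that each key lands on server key % num_servers.

    Returns the new per-server partitions and the number of rows that had
    to be sent to a different server (i.e., over the network).
    """
    # One output partition per server in the (simulated) cluster.
    out: Dict[int, List[Row]] = {s: [] for s in range(num_servers)}
    sent_over_network = 0
    for src, rows in partitions.items():
        for key, payload in rows:
            dst = key % num_servers  # hypothetical partitioning function
            out[dst].append((key, payload))
            if dst != src:
                # The row leaves its source server: this is network traffic.
                sent_over_network += 1
    return out, sent_over_network


# Data already partitioned on the key: every row stays local.
local = {0: [(0, "a"), (2, "b")], 1: [(1, "c"), (3, "d")]}
print(shuffle(local, 2)[1])  # 0

# Data placed without regard to the key: half the rows cross the network.
misplaced = {0: [(1, "a"), (2, "b")], 1: [(3, "c"), (4, "d")]}
print(shuffle(misplaced, 2)[1])  # 2
```

The two calls illustrate why data placement matters. When rows already reside on the server that the partitioning function assigns them to, the shuffle sends nothing over the network. When they do not, every misplaced row costs network bandwidth. The locality-aware techniques described in the thesis go well beyond this toy model.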

Similar resources

AMOS-SDDS: A Scalable Distributed Data Manager for Windows Multicomputers

Known parallel DBMS offer at present only static partitioning schemes. Adding a storage node is then a cumbersome operation that typically requires the manual data redistribution. We present an architecture termed AMOS-SDDS for a shared-nothing multicomputer. We have coupled a high-performance main-memory DBMS AMOS-II and a manager of Scalable Distributed Data Structures (SDDS) into a scalable d...

Making XML Database Systems Scalable to Computer Resources and Data Volumes

Increasing use of XML has emphasized the need for scalable database systems that are capable of handling a large amount of XML data efficiently. This study explores effective methods for making a scalable XML database system in the following aspects: (a) scalability to data volumes, (b) scalable XML processing with a shared-nothing PC cluster, and (c) scalable database processing on shared-memo...

Effective Spatial Data Partitioning for Scalable Query Processing

Recently, MapReduce based spatial query systems have emerged as a cost effective and scalable solution to large scale spatial data processing and analytics. MapReduce based systems achieve massive scalability by partitioning the data and running query tasks on those partitions in parallel. Therefore, effective data partitioning is critical for task parallelization, load balancing, and directly ...

Tuning a Parallel Database Algorithm on a Shared-memory Multiprocessor

Database query processing can benefit significantly from parallelism. Parallel database algorithms combine substantial CPU and I/O activity, memory requirements, and massive data exchange between processes, all of which must be considered to obtain optimal performance. Since parallel external sorting is a very typical example, we have focused on sorting to tune Volcano, a new query processing s...

Distributed Graph Layout for Scalable Small-world Network Analysis

The in-memory graph layout or organization has a considerable impact on the time and energy efficiency of distributed memory graph computations. It affects memory locality, inter-task load balance, communication time, and overall memory utilization. Graph layout could refer to partitioning or replication of vertex and edge arrays, selective replication of data structures that hold meta-data, an...

Publication date: 2016